The first steps in the Human Genome Project are to develop the needed technologies, then to "map" and to "sequence" the genome. But in a sense, these well-publicized efforts aim only to provide the raw material for the next, longer strides. The ultimate goal is to exploit those resources for a truly profound molecular-level understanding of how we develop from embryo to adult, what makes us work, and what causes things to go wrong. In the offing is a new era of molecular medicine characterized not by treating symptoms, but rather by looking to the deepest causes of disease. Rapid and more accurate diagnostic tests will make possible earlier treatment for countless maladies. Even more promising, insights into genetic susceptibilities to disease and to environmental insults, coupled with preventive therapies, will thwart some diseases altogether. New, highly targeted pharmaceuticals, not just for heritable diseases, but for communicable ailments, as well, will attack diseases at their molecular foundations. And even gene therapy will become possible, in some cases actually "fixing" genetic errors. All of this in addition to a new intellectual perspective on who we are and where we came from.
But how is it possible, with the incredible diversity of the world's five and a half billion people, to determine an "average" genome that can be considered even reasonably accurate? The answer lies in an understanding that DNA, with its billions of base pairs, is not the workhorse of the human body -- proteins are -- and that the body is built and run with fewer than 100,000 different kinds of protein molecules. For each of these proteins, we can imagine a single corresponding gene composed of many base pairs (though there is sometimes some redundancy), whose job it is to ensure an adequate and timely supply of the structural or regulatory materials. In a very real sense, then, all of the subtlety of our species, all of our art and science, is ultimately accounted for by a surprisingly small set of discrete genetic instructions. More surprising still, the differences between two unrelated individuals, between the man next door and Mozart, for example, may reflect a mere handful of differences in their genomic recipes -- perhaps one altered word in five hundred. We are far more alike than we are different. At the same time, there is room for near-infinite variety.
It is no overstatement to say that to decode our 100,000 genes in some fundamental way would be an epochal step toward unraveling the manifold mysteries of life -- but only if we understand what those genes produce and how they regulate that production. It is the difference between seeing a string of letters on a page and recognizing which of those letters actually form words, and which of those words form sentences. Meaningful information only derives from the relationship between the words to form discrete ideas.
The human genome is the full complement of genetic material in a human cell. The genome, in turn, is distributed among 23 pairs of chromosomes, which, in each of us, have been replicated and re-replicated since the fusion of sperm and egg that marked our conception. The source of our personal uniqueness, our full genome, is therefore preserved in each of our body's several trillion cells. At a more basic level, the genome is DNA, deoxyribonucleic acid, a natural polymer built up of repeating nucleotides, each consisting of a simple sugar, a phosphate group, and one of four nitrogenous bases. In the chromosomes, two DNA strands are twisted together into an entwined spiral -- the famous double helix -- held together by weak bonds between complementary bases, adenine to thymine (A-T) and cytosine to guanine (C-G); structurally the molecule resembles a twisted ladder. In the language of molecular genetics, each of these linkages constitutes a base pair. All told, if we count only one of each pair of chromosomes, the human genome comprises about three billion base pairs.
The specificity of these base-pair linkages underlies all that is wonderful about DNA. First, replication becomes straightforward. Unzipping the double helix provides unambiguous templates for the synthesis of daughter molecules: One helix begets two with near-perfect fidelity. Second, by a similar template-based process, a means is also available for producing a DNA-like messenger known as messenger RNA (or mRNA). This faithful complement of a particular DNA segment transports its information to the cell's cytoplasm where it directs the synthesis of a particular protein. Many subtleties are entailed in the synthesis of proteins, but in a schematic sense, the process is elegantly simple.
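The template idea can be made concrete with a small sketch in Python. It is an illustration only, under loud simplifying assumptions: strands are plain strings of the letters A, T, C, and G, strand direction is ignored, and the function names (`complement_strand`, `replicate`, `transcribe`) are invented here rather than taken from any library. The one fact borrowed from outside this paragraph is that RNA carries uracil (U) where DNA carries thymine.

```python
# Pairing rules for DNA: A bonds with T, C bonds with G.
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Pairing rules when a DNA template is copied into RNA:
# the same rules, except that RNA uses uracil (U) opposite adenine.
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand):
    """Return the base-by-base DNA complement of a strand.

    Simplification: real strands are antiparallel; direction is ignored here.
    """
    return "".join(DNA_PAIR[base] for base in strand)


def replicate(strand_a, strand_b):
    """Unzip a double helix and rebuild a partner for each old strand.

    Returns two double helices, each one old strand plus one new strand.
    """
    daughter_1 = (strand_a, complement_strand(strand_a))
    daughter_2 = (complement_strand(strand_b), strand_b)
    return daughter_1, daughter_2


def transcribe(template):
    """Return the messenger RNA complementary to a DNA template strand."""
    return "".join(RNA_PAIR[base] for base in template)


top = "ATGC"
bottom = complement_strand(top)          # the partner strand of the helix
helix_1, helix_2 = replicate(top, bottom)
message = transcribe("TTACGT")           # mRNA copied from a DNA template
```

Because each base admits exactly one partner, both daughter helices come out identical to the parent, which is the sense in which one helix begets two.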
Every protein is made up of one or more polypeptide chains, each a series of (typically) several hundred molecules known as amino acids, linked by so-called peptide bonds. Remarkably, only 20 different kinds of amino acids suffice as the building blocks for all human proteins. The synthesis of a protein chain, then, is simply a matter of specifying a particular sequence of amino acids. Each linear sequence of three bases (both in RNA and in DNA) corresponds uniquely to a single amino acid. The RNA sequence AAU (RNA uses the base uracil instead of thymine) thus dictates that the amino acid asparagine should be added to a polypeptide chain, GCA specifies alanine -- and so on. A segment of the chromosomal DNA that directs the synthesis of a single type of protein constitutes a single gene.
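The triplet rule lends itself to a short sketch as well. The Python below reads an RNA message three bases at a time and looks up each triplet in a table. The table is deliberately partial: AAU and GCA come from the text above, while AUG (methionine) and UGG (tryptophan) are added from the standard genetic code purely for illustration. The names `CODON_TABLE`, `split_into_triplets`, and `translate` are hypothetical, and real translation involves start and stop signals that are left out here.

```python
# A partial lookup table from RNA triplets to amino acids.
CODON_TABLE = {
    "AAU": "asparagine",   # example given in the text
    "GCA": "alanine",      # example given in the text
    "AUG": "methionine",   # standard genetic code; not mentioned in the text
    "UGG": "tryptophan",   # standard genetic code; not mentioned in the text
}


def split_into_triplets(mrna):
    """Cut an RNA string into consecutive three-base groups.

    Any one or two leftover bases at the end are dropped.
    """
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def translate(mrna):
    """Map each triplet to its amino acid, or 'unknown' if not in the table."""
    return [CODON_TABLE.get(triplet, "unknown")
            for triplet in split_into_triplets(mrna)]


chain = translate("AAUGCA")
```

Here the six-base message AAUGCA splits into AAU and GCA, so the resulting chain is asparagine followed by alanine. A complete table would hold 64 entries, one for every possible triplet of four bases.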
As we have seen before, one of the central goals of the Human Genome Project is to produce a detailed "map" of the human genome. But, just as there are topographic maps and political maps and highway maps of the United States, so there are different kinds of genome maps.